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Abstract 

Background: Random monoallelic expression contributes to phenotypic variation of cells and organisms. However, 
the epigenetic mechanisms by which individual alleles are randomly selected for expression are not known. Taking 
cues from chromatin signatures at imprinted gene loci such as the insulin-like growth factor 2 gene 2 {IGF2), we 
evaluated the contribution of CTCF, a zinc finger protein required for parent-of-origin-specific expression of the 
IGF2 gene, as well as a role for allele-specific association with DNA methylation, histone modification and RNA 
polymerase II. 

Results: Using array-based chromatin immunoprecipitation, we identified 293 genomic loci that are associated 
with both CTCF and histone H3 trimethylated at lysine 9 (H3K9me3). A comparison of their genomic positions with 
those of previously published monoallelically expressed genes revealed no significant overlap between allele- 
specifically expressed genes and colocalized CTCF/H3K9me3. To analyze the contributions of CTCF and H3K9me3 
to gene regulation in more detail, we focused on the monoallelically expressed IGF2BP1 gene. In vitro binding 
assays using the CTCF target motif at the IGF2BP1 gene, as well as allele-specific analysis of cytosine methylation 
and CTCF binding, revealed that CTCF does not regulate mono- or biallelic IGF2BP1 expression. Surprisingly, we 
found that RNA polymerase II is detected on both the maternal and paternal alleles in B lymphoblasts that express 
IGF2BP1 primarily from one allele. Thus, allele-specific control of RNA polymerase II elongation regulates the allelic 
bias of IGF2BP1 gene expression. 

Conclusions: Colocalization of CTCF and H3K9me3 does not represent a reliable chromatin signature indicative of 
monoallelic expression. Moreover, association of individual alleles with both active (H3K4me3) and silent 
(H3K27me3) chromatin modifications (allelic bivalent chromatin) or with RNA polymerase II also fails to identify 
monoallelically expressed gene loci. The selection of individual alleles for expression occurs in part during 
transcription elongation. 



Background 

Allele-specific gene expression is an integral component 
of cellular programming and development and contri- 
butes to the diversity of cellular phenotypes [1,2]. Allelic 
differences in gene expression are mediated by either 
parent-of-origin-specific selection (imprinting) or sto- 
chastic selection of alleles for activation and/or silen- 
cing. The importance of genomic imprinting has 
recently been highlighted by RNA sequencing studies 
that demonstrated widespread allelic differences in gene 
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expression in mouse brain affecting more than 1,300 
genes [3]. The extent of sex- and stage-specific expres- 
sion of individual alleles emphasizes the essential role of 
allelic transcriptional regulation in development. In 
addition to the extensive occurrence of imprinted par- 
ent-of-origin-specific expression, gene expression pat- 
terns of clonal cell populations are also modified by 
random or stochastic silencing of either the maternal or 
paternal allele. Well-known loci displaying allele-specific 
expression include odorant receptor genes, immunoglo- 
bulins and various receptor proteins [4-6]. Additionally, 
previous large-scale studies have provided new data 
demonstrating that parent-of-origin-specific expression 
is employed much more frequently than previously 
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thought [7]. These new findings illustrate the scale and 
complexity of genomic allele-specific expression. How- 
ever, the precise molecular mechanism underlying the 
allelic bias in gene expression is not very well 
understood. 

The best-characterized locus with strict monoallelic 
imprinted gene expression is the region containing the 
insulin-like growth factor 2 (IGF2) and H19 genes [8]. 
The regulation of this locus relies on the imprinting 
control region (ICR), which acquires DNA methylation 
on the paternal allele during normal development of the 
male germline. Methylation of cytosines at the ICR inhi- 
bits binding of the zinc finger protein CTCF to the 
paternal allele, preventing its role as an insulator and 
allowing long-range interactions of the IGF2 promoter 
with enhancer elements downstream of the HI 9 gene 
[9-11]. In contrast, the unmethylated ICR on the mater- 
nal allele recruits CTCF, effectively preventing promo- 
ter-enhancer interactions and maintaining repression of 
the maternal IGF2 gene. 

The well-documented requirement of CTCF for 
imprinted expression at the IGF2/H19 gene locus is 
thought to result from its role in establishing and/or 
maintaining long-distance interactions between regula- 
tory elements [12]. Allele-specific binding of CTCF to 
the ICR has long been known to be essential for the for- 
mation of chromatin loops. While the precise mechan- 
ism of CTCF's role in long-distance chromatin 
interactions remains unknown, several studies have pro- 
vided a rationale for the differential expression of the 
maternal and paternal IGF2 gene by revealing an inter- 
action of CTCF with cohesin, a protein complex known 
for its requirement during sister chromatid cohesion in 
mitosis [13-16]. Chromosome conformation capture 
experiments in combination with RNA interference 
assays recently confirmed the CTCF and cohesin-depen- 
dent formation of higher-order chromatin structures at 
the IGF2IH19 and other gene loci [17-19]. 

In addition to DNA methylation, histone modifications 
also contribute to the maintenance of allele-specific 
expression. DNA methylation of ICRs is accompanied 
by repressive histone markers, including histone H3 tri- 
methylated at lysine 9 (H3K9me3). In contrast, the 
unmethylated allele is characterized by permissive his- 
tone markers, including histone H3 trimethylated at 
lysine 4 [20]. Colocalization of epigenetic markers 
including DNA methylation and histone H3 dimethy- 
lated at lysine 9 has been exploited to identify epigeneti- 
cally distinct parental alleles. Chromosomal regions 
displaying overlaps of euchromatin and heterochroma- 
tin-specific markers have been enriched for known 
imprinted genes [21]. 

Despite the importance of monoallelic expression in 
cellular development and differentiation, little is known 



about the establishment and maintenance of random 
monoallelic expression. The link between allele-specific 
binding of CTCF and monoallelic expression of the 
IGF2 gene prompted us to test whether the presence of 
CTCF and H3K9me3 specifies a chromatin arrangement 
which demarcates random monoallelically expressed 
alleles. Using array-based chromatin immunoprecipita- 
tion (ChlP-chip), we identified 293 loci displaying these 
chromatin markers. We selected the IGF2BP1 gene 
locus to further examine whether the presence of CTCF 
and H3K9me3 comprises a necessary chromatin 
arrangement for a specific expression profile analogous 
to the monoallelic behavior observed at the IGF2/H19 
locus. Surprisingly, colocalization of CTCF and 
H3K9me3 does not provide a reliable measure of mono- 
allelic binding of CTCF at the IGF2BP1 gene. Our stu- 
dies included allele-specific sequencing of 
immunoprecipitated chromatin to demonstrate that 
chromatin at each IGF2BP1 allele is bivalent. Impor- 
tantly, both alleles recruit RNA polymerase II, suggest- 
ing that silencing of one IGF2BP1 allele occurs after 
transcription initiation. By establishing which epigenetic 
configurations are involved in governing monoallelic 
gene expression, we will broaden the understanding of 
epigenetic mechanisms as they relate to cancer progres- 
sion and cellular differentiation. 

Results 

Colocalization of CTCF and H3K9me3 in the human 
genome 

Allele-specific binding of CTCF to the ICR regulates 
parent-of-origin-specific expression of the IGF2 gene 
and correlates with differential cytosine methylation and 
the presence of H3K9me3 [9-11]. We carried out a 
large-scale survey to identify genomic sites with chroma- 
tin markers similar to those at the ICR of the IGF2/H19 
locus. Using ChlP-chip, we identified CTCF binding 
sites by tiling through the nonrepetitive portion of the 
genome in 100-bp intervals. Genomic sites bound by 
CTCF were assembled on a condensed array set that 
tiled through 9,823 sites using overlapping probes, and 
replicate ChIP experiments were performed. By using 
conservative criteria (positive signal in three replicates; P 
< 0.05) in this analysis, we identified 8,462 loci that 
interact with CTCF. To identify the subset of sites that 
associate with both CTCF and H3K9me3, we tested the 
association of these 8,462 loci with H3K9me3 using the 
condensed DNA array set. These analyses revealed 293 
loci that are both bound by CTCF and marked by 
H3K9me3 (Table SI in Additional file 1) (distances of 
CTCF and H3K9me3 peaks < 500 bp). Of the 293 loci, 
115 directly mapped to coding regions. Of the remain- 
ing loci (174 of 293), the majority (147 loci) were 
located in intergenic regions at a distance > 10 kb to 



Thomas et al. Epigenetics & Chromatin 201 1, 4:14 
http://www.epigeneticsandchromatin.eom/content/4/1/14 



Page 3 of 16 



the nearest 5' end of known genes. Only 27 loci mapped 
to promoter regions. Overall, 40% of the CTCF/ 
H3K9me3 loci mapped to intergenic regions, 51% 
mapped to intragenic domains and 9% mapped to pro- 
moter regions, a distribution similar to that of the 8,462 
CTCF loci (44%, 51% and 10% respectively). Notably, 
the CTCF-regulated IGF2/H19 locus is included in the 
subset of 293 loci (Figure SI in Additional file 2), sug- 
gesting that our experimental approach may be useful 
for the identification of similarly expressed genes. 

IGF2BP1 alleles are stochastically expressed in human B 
cells 

Genes classified as "monoallelically expressed" encom- 
pass both imprinted genes, such as the IGF2 gene, 
where monoallelic expression is regulated in a parent- 
of-origin-specific manner, and stochastic loci, where 
individual alleles are randomly selected for expression 
independent of parental origin. In recent studies in 
which allele-specific transcription was assessed in several 
human cell lines, more than 300 (7.5%) of 4,000 human 
genes examined were subject to random monoallelic 
expression, with a majority of the latter being capable of 
biallelic expression [7]. 

To examine whether CTCF binding at sites marked by 
H3K9me3 is indicative of monoallelic expression, we 
first compared the genomic positions of our 293 loci 
with the list of genes expressed in a random allele-speci- 
fic manner. Only a small number of genes (8 of 293 
loci) were common to both the monoallelically 
expressed cohort described by Gimelbrant et al. [7] and 
our CTCF/H3K9me3 set of ChlP-chip binding loci. 

To further examine the correlation between CTCF/ 
H3K9me3 and monoallelic expression, we selected 12 
genes located near one of the 293 CTCF/H3K9me3 sites 
(DIAPH1, FUS1, PKP1, ARFGAP2, PCDHGA, MTHFR, 
LAIR1, GPR3, ARMET, NPR1, NHLRC1 and IGF2BP1) 
to search lymphoblastoid cell lines (LCLs) derived from 
a pedigree from the Center d'Etude du Polymorphisme 
Humaine (CEPH) for SNPs in exonic and 3'-UTR 
regions. The monoclonality of LCLs was confirmed by 
analysis of their immunoglobulin heavy chain (IgH) 
gene rearrangement (Figure S2 in Additional file 2) [22]. 
Sequencing of genomic DNA (gDNA) and cDNA of 
LCLs identified the insulin-like growth factor binding 
protein gene IGF2BP1 as the only candidate gene 
expressed from only one allele (Table S2 in Additional 
file 1). IGF2BP1 is an RNA-binding protein that regu- 
lates transcript stability and translation of the imprinted 
IGF2 gene [23]. In addition, IGF2BP1 binds to H19, 
MYC and p-TrCPl mRNA to regulate message half-life, 
localization and translation of RNA, suggesting that the 
regulation of IGF2BP1 expression may affect disease and 
development [24,25]. We focused on IGF2BP1 to 



examine the contribution of CTCF and H3K9me3 mar- 
kers colocalized at intron 5 to allele-specific expression 
(Figure 1). 

Sequencing of gDNA identified 10 individuals that 
were heterozygous at SNP rsl 1655950 in the 3'-UTR of 
IGF2BP1 (Figure 2A). All heterozygous SNPs were sub- 
sequently typed in cDNA. A comparison of the tran- 
scriptome-derived genotypes to genomic genotypes 
indicated that six individuals expressed IGF2BP1 pri- 
marily from only one allele. In contrast, four individuals 
were found to express both IGF2BP1 alleles (Figure 2A). 
SNP determination for genomic and cDNA for CEPH 
family 1331 was confirmed by allelic discrimination 
assays based on fluorogenic probes (TaqMan allelic dis- 
crimination assay; Applied Biosystems, Foster City, CA, 
USA), which yielded identical results (Figure S3 in Addi- 
tional file 2). The TaqMan allelic discrimination assay, a 
real-time PCR based approach, yields a scatterplot of 
genotypes capable of quantitatively detecting a range of 
1:1 and 1:5 ratios of individual alleles in DNA mixtures 
at SNP rsll655950 (Figure S4 in Additional file 2). Indi- 
viduals GM7033 and GM6989 were found to express 
the paternally inherited IGF2BP1 allele, while GM7030 
and GM7005 were found to express the maternally 
inherited allele (Figure 2). Individuals GM7007 and 
GM7016 also exhibited monoallelic expression of 
IGF2BP1, but we were unable to identify the mode of 
expression because of the limited pedigree. These data 
indicate that monoallelic expression at the IGF2BP1 
gene locus is not determined by parent-of-origin mark- 
ings; instead, it is defined by stochastic choice. 

CTCF binds to its target motif at the IGF2BP1 locus 
independently of DNA methylation 

Binding of CTCF to its target motifs at both the human 
and mouse ICR of the IGF2/H19 locus is sensitive to 
DNA methylation [10,26]. To test whether monoallelic 
expression of IGF2BP1 in some individuals is also regu- 
lated by monoallelic DNA methylation of CTCF binding 
motifs, we examined a role for CpG methylation and 
allele-specific binding of CTCF at this locus. 

To precisely determine the DNA sequence required 
for CTCF binding at the IGF2BP1 locus, we searched 
for potential motifs using SOMBRERO [27], a de novo 
motif-finding algorithm that uses multiple self-organiz- 
ing maps (SOM) to cluster sequences of a specific 
length (reads) from a set of input sequences (such as 
enriched genomic loci identified by ChlP-chip experi- 
ments). Motif alignment using STAMP [28] and com- 
parison to the JASPAR transcription factor database [29] 
identified a distinct cohort of 68 motif models, all of 
which were identical to the canonical CTCF motif pre- 
viously reported (Figure S5 in Additional file 2) [30]. 
The clustered reads associated with all 68 motif models 
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Figure 1 Colocalization of CTCF and H3K9me3 at the IGF2BP1 locus. Array-based chromatin immunoprecipitation (ChlP-chip) data for both 
CTCF and histone H3 trimethylated at lysine 9 (H3K9me3) identify candidate loci for analysis of monoallelic expression. (A) Depiction of the 
IGF2BP1 gene with specific SNPs examined in this study (arrows). (B) Close-up portion of the locus with tracks for CTCF enrichment (top track) 
and H3K9me3 association (bottom track) near SNP site rs1 1870560. The ChlP-chip data are displayed using the UCSC Genome Browser. DNA 
derived from CTCF ChIP experiments was analyzed by using microarrays with hybridization probes spaced 100 bp apart. The higher resolution of 
the H3K9me3 ChlP-chip data is due to the use of condensed array sets that tiled through all of the CTCF-positive regions with probes 
overlapping each other by 12 nt. 



were mapped back to sequences enriched in our ChlP- 
chip analysis and were displayed using the UCSC Gen- 
ome Browser (Figure S6 in Additional file 2). Using this 
approach, we identified 28,713 peaks, each composed of 
multiple overlapping reads, within the original 8,462 
ChlP-chip loci. Using a strategy similar to that used to 
study ChlP-seq clustering [31], our frequency analysis of 
these peak heights yielded a bimodal distribution with 
an evident power law at low peak heights deviating to a 
clear excess in the numbers of peaks with heights > 10 
(Figure S7 in Additional file 2). We consequently parti- 
tioned the peak populations into low-confidence and 
high-confidence groups using the peak height threshold 
of 10 (Figure S8 in Additional file 2). 

Using this approach, we identified three potential 
motifs (X, Y and Z) (Figure 3) within the 350-bp region 
of the IGF2BP1 gene locus enriched in our ChlP-chip 
experiments. Two of the putative binding sites, Y and Z, 
accumulated a significant number of matches to motif 
models. However, only one of the three putative CTCF 
binding sites belongs to the group of high-confidence 
binding sites (site Y) (Figure 3). In support of our in 
silico analysis of CTCF binding, previously published 
high-resolution ChlP-seq data on CTCF binding 
revealed enrichment of sequences surrounding motifs Y 



and Z (Figure 3A), suggesting that either one or both 
motifs is required for CTCF recruitment. 

To further define the contribution of motifs Y and Z 
to CTCF binding, we measured their ability to recruit 
CTCF in vitro using immobilized template assays (Fig- 
ures 3B and 3C). Wild-type and mutant DNA templates 
containing either one or both motifs were linked to 
magnetic beads, incubated with nuclear extract, washed 
and tested for association with CTCF by performing 
Western blot analysis. A 105-bp template containing the 
wild-type IGF2BP1 intronic sequence efficiently 
recruited CTCF (Y wt 105-bp template) (Figure 3B). In 
contrast, CTCF binding was severely reduced when the 
putative CTCF motif Y was mutated by four base substi- 
tutions (Figure 3C and Figure S9 in Additional file 2). 
To test the contribution of the adjacent motif Z to 
CTCF binding at the IGF2BP1 locus, we generated sev- 
eral 125-bp DNA templates that encompassed both 
CTCF target motifs (Figure 3B). Targeted mutations at 
specific positions of motif Y and/or motif Z were intro- 
duced to test the contribution of each motif to recruit- 
ment of CTCF. Detailed sequences are shown in Figure 
S9 in Additional file 2. As shown in Figure 3C, the 125- 
bp template recruited CTCF more efficiently than the 
105-bp template. However, motif Z does not contribute 
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Figure 2 Analysis of allele-specific IGF2BP1 expression. Comparative analysis of sequence variations in B lymphoblasts of the CEPH pedigree 
family 1331 reveals monoallelic expression of the IGF2BP1 gene. (A) Pedigree analysis was carried out for the SNP site rs1 1655950 located in the 
3'-UTR of the IGF2BP1 gene. Each individual is shown with CEPH family identification, sample identification and genetic information (SNP 
genomic DNA (gDNA) genotype- or transcript-derived genotype). Individuals with monoallelic IGF2BP1 gene expression are indicated by asterisks. 
If the individual is homozygous at the SNP, allele-specific expression cannot be defined. (B) Left: Genotyping results at rs1 1655950 with gDNA 
from members of CEPH family 1331. gDNA was analyzed using the TaqMan SNP Genotyping Assay. This assay discriminates between sequence 
variants using two allele-specific probes carrying two different fluorophores, VIC and FAM. Individuals coded in red and green represent cell lines 
that are homozygous for alleles A and G, respectively. Orange-labeled individuals contain both A and G alleles at SNP rs1 1655950 and represent 
informative cell lines used for further analysis of monoallelic expression. Diamonds indicate cDNA samples, and black x indicates averaged 
triplicates of a no-template control (NTC) near the origin of the graph. Right: Genotyping results of transcript-derived cDNA from heterozygous B 
lymphoblasts. Individuals are color-coded in the figure key. No-RT controls (No RT) from cDNA synthesis are shown near the origin of the graph 
and are indicated by a black X. Control samples (standards) of stem cell lines previously genotyped as homozygous AA, heterozygous AG and 
homozygous GG were plotted and are indicated by diamonds. 



to CTCF recruitment, since targeted mutations in motif 
Z do not influence the level of CTCF binding. Consis- 
tent with this notion, CTCF binding is undetectable in 
the absence of a wild-type motif Y (Figure 3C). 

CTCF binding site Y at the IGF2BP1 gene contains a 
single CpG residue adjacent to the 14-bp core sequence of 
CTCF (Figure 4A). To establish whether binding of CTCF 
to Y^ is inhibited by cytosine methylation, we tested Y^ 
105-bp immobilized templates after in vitro methylation of 
cytosine residues by CpG methyltransferase M.Sssl. For 



comparison, we examined CTCF motifs containing a 
higher CpG content, including site A of the MYC gene 
[32] as well as the Bl sequence of the ICR of the human 
IGF2IH19 locus [10]. Cytosine methylation at the human 
Bl sequence is known to inhibit binding of CTCF. Consis- 
tent with this, recruitment of CTCF in vitro to immobi- 
lized templates containing the Bl sequence or the MYC 
site A is highly sensitive to DNA methylation (Figure 4B, 
top). In contrast, CpG methylation of the Y^ motif has no 
effect on CTCF recruitment. Replacement of the Y wt core 
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Figure 3 Functional CTCF sequence motifs at the intronic region of the IGF2BP1 gene. (A) UCSC Genome Browser display of relative 
positions of high- and low-confidence CTCF target motifs, ChlP-chip, ChIP sequencing (ChlP-seq) and ChIP self-organizing maps results. (B) Y wt 
105-bp and Y wt Z wt 125-bp templates employed in the immobilized template assay. Detailed sequences of the templates are shown in Figure S9 
in Additional file 2. (C) Western blot analysis of CTCF recruitment to Y wt 105-bp and Y wt Z wt 125-bp templates containing combinations of wild- 
type and mutated CTCF target sequences. Motif Y is sufficient for recruitment of CTCF. 



motif by the CTCF-binding sites of the chicken FII insula- 
tor element yields similar results. However, CTCF binding 
becomes sensitive to CpG methylation upon modification 
of the core motif to the mouse R3 sequence, a homologue 



of the human Bl sequence. In combination, despite the 
presence of a methylable CpG residue, binding of CTCF 
to the Y wt sequence of the IGF2BP1 gene in vitro is not 
sensitive to CpG methylation. 
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Figure 4 Cytosine methylation of the CTCF core motif Y does not influence binding of CTCF. (A) CTCF motifs used in the context of the 
105-bp immobilized template derived from the intronic region of the IGF2BP1 gene are shown. The position frequency matrix of the CTCF 
target motif is shown at the top. Only the sense strand of the motifs is shown. CpG residues are indicated by filled black circles. Myc-A, IGF2 
huB1 and Y wt are CTCF target sequences derived from MYQ IGF2 and IGF2BP1 gene loci. Y mut chFII and Y mut mmR3 contain the CTCF target 
sequence of the chicken HS4 insulator [57] and the CTCF target region of the mouse imprinting control region R3 [10]. (B) Top: control 
experiments revealed the sensitivity of CTCF binding to DNA methylation (CpG me) at the myc-A and IGF2 huB1 templates. Bottom: methylation 
of the 105-bp Y wt template did not affect the recruitment of CTCF. While methylated chicken Fll CTCF target sites efficiently recruited CTCF, CpG 
methylation of the mouse R3 sequence decreased the binding of CTCF. 
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To confirm that our in vitro characterization of CTCF 
binding accurately reflected the in vivo association of 
CTCF with the IGF2BP1 locus, we evaluated the methy- 
lation status of the CTCF motif and adjacent CpG resi- 
dues in the IGF2BP1 intronic region in both biallelically 
(GM7057) and monoallelically (GM6989) expressing 
cells by using bisulfite sequencing (Figure 5). The 
methylation levels were calculated using BiQ Analyzer 
software [33]. Our data reveal that the CpG residue at 
the 5' end of the CTCF binding motif Y is invariably 
methylated. In addition, other methylable residues in 
this region exhibited some degree of DNA methylation. 
To further confirm binding of CTCF to methylated 
IGF2BP1 intronic sequences, we bisulfite-sequenced 
DNA derived from immunoprecipitates of ChIP experi- 
ments with CTCF antibodies. As a control, we bisulfite- 
sequenced the IGF2BP1 region derived from anti- 
H3K9me3 ChIP experiments. The results confirmed our 



in vitro finding that demonstrated an association of 
CTCF with a methylated motif (Figures 5B and 5C). 

CTCF and H3K9me3 colocalize at both the maternal and 
paternal IGF2BP1 alleles 

Consistent methylation of the CTCF-binding motif in 
IGF2BP1 indicated that DNA methylation is not allele- 
specific. To directly determine whether CTCF is bound 
monoallelically, we determined the allele-specific asso- 
ciation of both CTCF and H3K9me3 by sequencing 
DNA recovered from ChIP experiments. We first identi- 
fied informative cell lines by genotyping individuals 
from CEPH pedigree 1331 at SNP sites located close to 
the CTCF binding site. Cell lines derived from both 
monoallelically (GM7016 and GM6989) and biallelically 
(GM7057) expressing individuals were heterozygous at 
SNP site rsl 1870560 at the CTCF site (Figure 6A). We 
first applied the allelic discrimination assay to serial 
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Figure 5 DNA methylation analysis of the IGF2BP1 CTCF binding region. Analysis of DNA methylation with bisulfite sequencing at the 
intronic CTCF binding region of the IGF2BP1 gene is shown. (A) The percentage of methylation of CpG sites in gDNA derived from cell lines 
that express IGF2BP1 from only one allele (GM7016, GM6989) or from both alleles (GM7057) is shown. The CpG residue located within the CTCF 
binding motif is invariably methylated and is indicated by the thick black bar located adjacent to CpG site 7 (indicated by asterisks). (B) The 
percentage of methylation at each CpG site of the IGF2BP1 CTCF site in DNA samples recovered from anti-H3K9me3 ChIP. (C) The percentage of 
methylation at each CpG site of the IGF2BP1 CTCF site in DNA samples recovered from anti-CTCF ChIP experiments. The level of DNA 
methylation is represented according to the heat map keys located at the bottom of the figure. 
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Figure 6 Allelic specificity of CTCF and H3K9me3. Informative ChIP templates were analyzed using the TaqMan allelic discrimination assay to 
address the allelic association of CTCF and H3K9me3. (A) Genotyping results at rs1 1870560 identify informative cell lines useful for the detection 
of allele-specific association of CTCF and H3K9me3. gDNA obtained from monoallelic and biallelic cell lines were genotyped using the TaqMan 
allelic discrimination assay. Squares represent gDNA samples and are coded in red and green to represent cell lines that are homozygous for 
allele C and allele T, respectively. Orange indicates heterozygous individuals. Averaged triplicate of a no-template control (NTC) is shown near 
the origin of the graph. (B) Genotyping at SNP rs1 1870560 with DNA templates recovered from ChIP experiments was used to identify the 
enrichment of the two alleles with either CTCF (circle) or H3K9me3 (triangle). Each color shown in the figure key represents a lymphoblastoid 
cell line (LCL) derived from an individual of the pedigree, while the shape represents the source of each sample (for example, squares signify 
input samples, while circles and triangles indicate ChIP samples obtained with CTCF and H3K9me3 antibodies, respectively). Immunoprecipitated 
templates were generated using the ChIP protocol described in Materials and Methods. Both monoallelic and biallelic cell lines indicate biallelic 
distribution of both CTCF and H3K9me3. Diamonds indicate control LCL samples (standards) previously genotyped as homozygous CC, 
heterozygous CT and homozygous TT. 



dilutions of known homozygotes of the two possible 
alleles to test its ability to quantitatively assess the con- 
tribution of each allele in a DNA mixture. This assay 
provides quantitative results with high sensitivity and 
reproducibility within a ten-fold range of DNA concen- 
trations, thus making it a useful tool for allelic discrimi- 
nation of immunoprecipitated DNA (Figure S4 in 
Additional file 2). We used two monoallelically 
(GM7016 and GM6989) and one biallelically (GM7057) 
expressing cell lines to genotype DNA recovered from 
ChIP assays using either anti-CTCF or anti-H3K9me3 
antibodies. Each analysis was performed in triplicate. 
Equal proportions of the two sequence variants were 
detected in DNA derived from ChIP assays with either 
H3K9me3 or CTCF antibodies, indicating that CTCF 
associates with both the maternal and paternal alleles 
(Figure 6B). Thus, monoallelic expression of the 
IGF2BP1 gene is not mediated through monoallelic 
binding of CTCF. 

The IGF2BP1 promoter associates with both active and 
silent histone modifications in B cells 

To define alternative mechanisms responsible for ran- 
dom monoallelic expression of IGF2BP1, we sought to 
identify markers that distinguish the active and inactive 
alleles. K27-trimethylated and K4-trimethylated histone 



H3, respectively, mark transcriptionally silent and active 
chromatin. We determined the relative enrichment of 
these two histone markers at the IGF2BP1 promoter for 
each allele in both monoallelically and biallelically 
expressing cell lines using ChIP with anti-H3K4me3 and 
anti-H3K27me3 antibodies. Both H3K4me3 and 
H3K27me3 were detected at the IGF2BP1 gene promo- 
ter (Figure 7A). To determine whether any of the his- 
tone modifications selectively associates with either 
allele, we again searched for informative sequence SNPs 
at the IGF2BP1 promoter region in the CEPH pedigree. 
Cell lines derived from individuals GM6989 (monoalleli- 
cally expressing cell line) and 7057 (biallelically expres- 
sing cell line) were heterozygous at SNP rs9890278 
located upstream of the transcription initiation site, 
whereas GM7007 (monoallelically expressing cell line) 
was heterozygous for SNP rs4794017 located 1 kb 
downstream of the transcription initiation site. To 
address whether active and silent alleles in these cell 
lines are distinguished by specific histone markers, we 
sequenced SNPs rs9890278 and rs4794017 in gDNA 
recovered from ChIP experiments using anti-H3K4me3 
and anti-H3K27me3 antibodies. The results revealed 
that both H3K4me3 and H3K27me3 are detected on 
both alleles in a bivalent fashion (Figure 7). In combina- 
tion, our results indicate that both active and silent 
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Figure 7 IGF2BP1 promoter region is enriched with activating and silencing chromatin modifications. DNA recovered from ChIP 
experiments using anti-H3K4me3, anti-H3K27me3 and RNA polymerase II ChIP templates was genotyped by sequencing the IGF2BP1 promoter 
region containing sequence variant rs4794017 or rs9890278. Left: Enrichment of H3K4me3 (K4) and H3K27me3 (K27) in monoallelically (GM7007, 
GM6989), and biallelically (GM7057) expressing cell lines. The positions of informative SNPs rs479017 and rs9890278 are shown in Figure 1. Both 
activating and silencing marks are significantly enriched. Right: Sequences enriched by ChIP were excised and sequenced. The results show an 
association of both alleles with active and silent histone modifications at the IGF2BP1 promoter region independent of transcriptional status. 



histone markers (H3K4me3 and H3K27me3) coexist in 
the promoter region of both IGF2BP1 alleles in monoal- 
lelically as well as biallelically expressing cell lines. 
These data indicate that allele-specific expression of 
IGF2BP1 cannot be explained by differential association 
of active and silent histone markers. 

Silencing of the inactive IGF2BP1 allele by inhibition of 
RNA polymerase II elongation 

Monoallelic expression of IGF2BP1 cannot be attributed 
solely to selective activation or silencing of one allele 
through histone modifications, since H3K4me3 as well 
as H3K27me3 are detected at both alleles. H3K4me3 is 
typically associated with transcriptionally active alleles, 
raising the question whether allele-specific transcription 
elongation or RNA processing accounts for monoallelic 
expression of the IGF2BP1 gene. To address this 
hypothesis, we again searched mono- and biallelically 
expressing cell lines for sequence SNPs near the site of 
transcription initiation at the IGF2BP1 promoter. 
Within CEPH pedigree 1331, only line GM7007 con- 
tained a heterozygous genotype at SNP site rs4794017 
located within intron 1, 1 kb downstream of the tran- 
scription initiation site. We performed RNA polymerase 
II ChIP on chromatin prepared from this monoallelically 
expressing line. Quantitative real-time PCR analyses 
revealed enrichment of IGF2BP1 promoter sequences 



similar to the enrichment observed at the MYC promo- 
ter. Immunoprecipitated DNA was PCR-amplified and 
sequenced (Figure 8A). Identification of both sequence 
variants at rs4794017 in DNA recovered from ChIP 
experiments indicates that RNA polymerase II is asso- 
ciated with both IGF2BP1 alleles, which is consistent 
with the presence of H3K4me3 at the promoter of both 
alleles. 

These data suggest that allele specificity of transcrip- 
tion is achieved after recruitment of RNA polymerase to 
both alleles, such as through transcriptional pausing 
and/or selective RNA processing. A major rate-limiting 
step in transcription elongation is pausing of RNA poly- 
merase II in the promoter proximal region immediately 
downstream of the transcription initiation site [34-37]. 
We sequenced the 5' portion of the IGF2BP1 gene of all 
monoallelically expressing cell lines to identify sequence 
variants that would be useful for allelic identification of 
promoter proximal regions occupied by RNA polymer- 
ase II or for the determination of the allelic origin of 
unspliced, precursor pre-mRNA transcripts. Since no 
additional informative sequence variants were identified, 
we focused on the detection and sequencing of pre- 
mRNA transcripts about 1 kb downstream of the tran- 
scription initiation site in GM7007. Using the informa- 
tive SNPs located within intron 1 of this gene, we 
targeted nascent unspliced RNA with primers designed 
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Figure 8 RNA polymerase II associates with both alleles in a monoallelically expressing cell line. (A) Recruitment of RNA polymerase II to 
the IGF2BP1 promoter was examined by ChIP in monoallelically expressing GM7007 cells. DNA recovered from chromatin that had been 
immunoprecipitated with anti-RNA polymerase II antibodies (Pol2) was amplified and sequenced for allelic association. Sequencing results 
(bottom) reveal that both alleles of the monoallelically expressing cell line GM7007 associate with RNA polymerase II near SNP site rs4794017. In 
contrast, sequencing of DNA from "no antibody" ChIP reactions failed to produce sequence reads. (B) Allele specificity of precursor mRNA was 
determined by sequencing of cDNA prepared from total RNA of GM7007 cells. RNA had been extensively pretreated with DNase I to eliminate 
gDNA prior to reverse transcription by RT. Subsequently, cDNA samples were amplified using primers flanking rs4794017. In the absence of RT 
(-RT), no amplification products were oberved. +RT amplicons were gel-purified and sequenced. Bottom: Sequence traces at the heterozygous 
SNP site rs4794017 located 1 kb downstream of the transcription initiation site in cDNA of GM 7007 indicate a single allele. 



to amplify a region containing SNP site rs4794017. To 
avoid detection of gDNA in RNA samples, DNA was 
efficiently removed by treatment with an engineered, 
highly active form of DNase I (TURBO DNase I; 
Applied Biosystems/Ambion, Austin, TX, USA). This 
protocol allowed detection of pre-mRNA free of gDNA 
contamination (Figure 8B). Sequencing of amplified 



IGF2BP1 pre-cDNA revealed only one of the two 
sequence variants at SNP rs4794017, indicating that pre- 
mRNA transcripts are transcribed from only one allele 
despite the presence of RNA polymerase II on both 
alleles. Thus, our data indicate that monoallelic expres- 
sion of the IGF2BP1 gene is regulated through allele- 
specific transcriptional elongation prior to SNP site 
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rs4794017, located approximately 600 bp downstream of 
the first intron splice site. 

Discussion 

Allele-specific expression in which one parental allele is 
stochastically or parent-of-origin-specifically silenced is 
widespread in mammalian organisms. Large-scale, allele- 
specific gene expression analyses have revealed that 5% 
to 10% of autosomal genes show random monoallelic 
transcription [7]. The stability of allele-specific expres- 
sion through many cell passages suggests that epigenetic 
modifications maintain this specific type of gene regula- 
tion throughout generations of cells. Analogously to the 
regulation at the imprinted IGF2/H19 locus, we tested 
the hypothesis whether monoallelic binding of CTCF, a 
characteristic marker for the IGF2/H19 ICR, also under- 
lies random monoallelic expression. Using ChlP-chip 
analyses, we identified chromosomal loci that are 
enriched in both CTCF and H3K9me3 and cross-corre- 
lated their positions with previously published lists of 
monoallelically expressed genes. Our data indicate that 
genomic loci enriched for both CTCF and H3K9me3 do 
not significantly correlate with monoallelically expressed 
genes. While this lack of correlation could be formally 
attributed to variations in monoallelic expression 
between different cell lines and types, it should be noted 
that the genome-wide pattern of CTCF binding is very 
consistent between different cell lineages [30,38,39]. 
Thus, if CTCF and H3K9me3 contribute to allele-speci- 
fic expression, it should be detectable through allele-spe- 
cific association of CTCF and H3K9me3. Focusing on 
the IGF2BP1 gene, we tested whether monoallelic 
expression in a pedigree of LCLs correlates with mono- 
allelic binding of CTCF. Although binding of CTCF to 
its targets is thought to be sensitive to DNA methyla- 
tion, we surprisingly found the cytosine residue closely 
flanking the CTCF target motif at the IGF2BP1 gene to 
be consistently methylated without any effect on CTCF 
recruitment. Indeed, our in vitro analyses of the binding 
requirements using immobilized templates confirmed 
that methylation of cytosine residues within the 
IGF2BP1 sequence does not affect CTCF binding. These 
data are consistent with those in previous studies in 
which researchers found that cytosine methylation out- 
side the CTCF core motif did not affect the binding affi- 
nity of bacterially expressed wild-type and mutant CTCF 
proteins [40]. This information is useful for the identifi- 
cation of the genomic subset of CTCF sites that might 
contribute to differential cell- and stage-specific expres- 
sion due to their sensitivity to cytosine methylation, 
potentially mediating changes in large-scale chromatin 
organization during development and disease. 

A number of studies have examined the correlation of 
allele-specific expression with allele-specific association 



of epigenetic markers [21,41-45]. The data produced by 
these studies have established common signatures of 
imprinted alleles, including H3K9me3 and H3K4me3, 
providing a powerful means by which to identify novel 
imprinted or monoallelically expressed loci [46-48]. In 
contrast to the strict allele-specific association of DNA 
methylation and chromatin markers at imprinted genes, 
histone modifications at the nonimprinted, monoalleli- 
cally expressed IGF2BP1 gene do not predict the active 
allele. Both H3K4me3 and H3K27me3, markers charac- 
teristic of active and inactive loci, are associated with 
each allele, as both sequence variants of SNP rs4794017 
are present in the DNA of heterozygous individuals 
recovered from ChIP experiments. Moreover, loading of 
RNA polymerase II also does not provide a reliable mar- 
ker for identifying the transcribed allele. Our ChIP 
experiments identified both sequence variants at SNP 
rs4794017 within the promoter proximal region of anti- 
RNA polymerase II immunoprecipitated DNA. Because 
only one LCL in our study was informative for determin- 
ing an association of RNA polymerase II at the IGF2BP1 
alleles, we could not define how frequently this type of 
regulation occurs within cell lineages and throughout the 
genome. However, other investigators have reported 
similar results at the PCNA gene. Maynard et al. [44] 
found that both PCNA alleles in IMR90 cells are bound 
by RNA polymerase II, although only one allele generates 
full-length mRNA. Together, these data suggest that 
transcription elongation not only is a general rate-limit- 
ing step in the transcription of the vast majority of genes 
[34,35,37] but also regulates the expression of a subset of 
monoallelically expressed genes. 

The expression of IGF2BP1 in differentiated cell types, 
including LCLs, is significantly lower than in embryonic 
stem cells. In an attempt to determine whether allele- 
specific expression also contributes to IGF2BP1 regula- 
tion early in development, we genotyped both gDNA 
and cDNA in 11 human embryonic stem cell (hESC) 
lines. However, while only three hESC lines were infor- 
mative (heterozygous at SNP rsl 1655950), all three 
expressed IGF2BP1 in a biallelic manner. Although the 
number of available and informative hESC lines is not 
sufficient to clearly define a role for allele-specific elon- 
gation in early developmental stages, we believe that it 
is unlikely that this mechanism is restricted to cell types 
with low levels of IGF2BP1 expression. Control of tran- 
scriptional activity through promoter proximal pausing 
or premature termination of transcription is not 
restricted to specific gene classes characterized by low 
levels of transcriptional activity [35]. We speculate that 
distinct positioning of the homologous alleles within the 
nuclear space and association with distinct "transcrip- 
tion factories" may contribute to monoallelic transcrip- 
tion elongation. 
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The IGF2BP1 gene is highly expressed during embryo- 
nic development and is required for the regulation of 
mRNA stability of several genes involved in growth reg- 
ulation, including the IGF2, p-catenin and MYC genes 
[23-25]. Consistent with its role in early developmental 
stages, the IGF2BP1 gene is downregulated in differen- 
tiated cell types, and overexpression of IGF2BP1 is 
known to occur in multiple human cancers, including 
breast, lung and colon [49-52]. Thus, changes in the 
level of IGF2BP1 expression through silencing of only 
one allele could provide a safeguard against pathogenesis 
and disease. 

Conclusions 

Allele-specific gene expression is common in the human 
genome and is thought to contribute to phenotypic varia- 
tion. The allele-specific association of CTCF, H3K9me3 
and DNA methylation is a characteristic marker of 
imprinted gene expression at the IGF2/H19 locus, raising 
the question whether these epigenetic markers are useful 
for identifying both imprinted and random monoallelically 
expressed genes throughout the genome. In this study, we 
have demonstrated that colocalization of CTCF and 
H3K9me3 does not represent a reliable chromatin signa- 
ture indicative of monoallelic expression. In addition, we 
conclude that allele-specific binding of CTCF requires 
methylation of very specific cytosine residues within the 
target motif, effectively limiting the number of CTCF bind- 
ing sites potentially affected by allele-specific binding. In 
addition, the active and inactive alleles of random monoal- 
lelically expressed genes do not necessarily correlate with 
active or inactive histone markers. Remarkably, the selec- 
tion of individual alleles for expression at the IGF2BP1 
locus occurs during early stages of transcription elongation. 

Methods 

ChlP-chip analyses 

The amplification and preparation of immunoprecipi- 
tated DNA derived from HBL100 cells for hybridization 
to ENCODE arrays (Roche NimbleGen Inc., Madison, 
WI, USA) was performed essentially as described pre- 
viously [53]. Sample labeling and array hybridization 
were performed at NimbleGen Systems Inc. Genomic 
control DNA was labeled with Cy3, and sample DNA 
was labeled with Cy5. Both Cy3- and Cy5-labeled DNA 
were hybridized to high-density arrays tiling through 
ENCODE regions with 50-mer oligonucleotides across 
nonrepetitive genomic regions. The ratios of the Cy3 
and Cy5 intensities of each probe were calculated using 
NimbleGen Systems' proprietary software. 

Peak detection and false-positive rate calculation 

A genomic sequence was considered a possible CTCF- 
binding site if there were at least four probes among the 



sequence probe and the flanking probes within a win- 
dow covering 250 bp on both sides of the probe had 
log 2 ratio values above a specified cutoff value. The cut- 
off value was calculated separately for each chromo- 
some. The cutoff value is a given percentage of the 
value (mean + 6 x standard deviation) of the log 2 ratio 
values of all the probes covering the chromosome. The 
possible binding sites thus detected are called peaks. To 
calculate the false-positive rate (FPR) by data permuta- 
tion, the log 2 ratio values among probes were scrambled 
to generate a randomized data set for each individual 
chromosome. Multiple repetitions of this process gener- 
ated 20 randomized data sets for each chromosome. 
Subsequently, the peak detection algorithm described 
above was applied to count the average number of 
peaks in the 20 randomized data sets using the same 
cutoff. The ratio of that number to the number of peaks 
from the nonrandomized data set is the FPR. The FPR 
is associated with the threshold setting, which is indi- 
cated by the value of cutoff P. Peak detection and ran- 
domization of data sets were repeated for different 
threshold settings of P. The corresponding FPRs were 
calculated and assigned to peaks. The FPR value 
assigned to the individual peaks is the value associated 
with the cutoff P at which the peak is first detected. 

Peak discovery was performed using chromatin immu- 
noprecipitate:input ratios combined from adjacent oligo- 
nucleotides within 250-bp regions. The FPR of detection 
was estimated by permutation analyses in which the 
experimentally determined log 2 ratio values were reas- 
signed to probes in a random fashion, allowing selection 
of stringency and specificity levels. To define sites of 
CTCF interaction with high confidence, peaks were 
required to be present in all three biological replicates 
and to be generated at a FPR < 0.05. 

Chromatin immunoprecipitation 

Chromatin was prepared for immunoprecipitation as 
described previously [54] by cross-linking the cells in 1% 
formaldehyde for 5 minutes and subjecting them to sub- 
sequent sonication until the bulk of DNA was 300 to 
600 bp in size. Chromatin corresponding to 2 x 10 7 
cells was immunoprecipitated with anti-CTCF antibody 
(D31H2; Cell Signaling Technology, Danvers, MA, 
USA), anti-H3K9me3 antibody (ab8898; Abeam, Cam- 
bridge, MA, USA), anti-trimethyl K4-histone H3 anti- 
body (ab8580; Abeam), anti-trimethyl K27-histone H3 
antibody (Millipore 07-449, Billerica MA, USA) or anti- 
RNA polymerase II antibody (sc899; Santa Cruz Bio- 
technology, Santa Cruz, CA, USA). Immunoprecipitates 
were washed, the DNA protein cross-links were reversed 
and the recovered DNA was tested by performing con- 
ventional quantitative PCR as described previously [54]. 
RNA polymerase II ChIP experiments were performed 
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using the Matrix ChIP protocol [55]. Sequences of pri- 
mers specific for the gene loci under study as well as 
the reference primers are available upon request. 

RNA extraction and RT-PCR 

Synthesis of cDNA was carried out according to the 
manufacturer's instructions (Qiagen, Valencia, CA, 
USA) using 1 (ig of total RNA. For detection of pre- 
mRNA, RNA preparations were pretreated with TURBO 
DNase I (Ambion/ Applied Biosystems) as described in 
the manufacturer's protocol. RT was carried out at 37°C 
for one hour. 

Cell culture 

Cell lines were cultured in RPMI 1640 medium supple- 
mented with 10% FCS, 2 mM L-glutamine and the anti- 
biotics penicillin (50 U/mL) and streptomycin. 

Sodium bisulfite conversions 

gDNA was treated with sodium bisulfite using the EZ 
DNA Methylation Kit (Zymo Research, Orange, CA, 
USA) according to the manufacturer's instructions. PCR 
amplification of bisulfite-treated DNA was performed 
using ZymoTaq DNA Polymerase (Zymo Research Cor- 
poration, Irvine, CA, USA) and conversion-specific pri- 
mers targeted to the IGF2BP1 CTCF region (forward 
primer: 5'-TATTTTTTAGTTGGGTTAAT-TGGTG-3', 
reverse primer: 5'-ATACTACCTCTCCTTCCAAA 
ATCTC-3'). The amplified products were purified by gel 
electrophoresis and sequenced. Each case was scored as 
methylated or unmethylated, and the percentage of 
methylation was calculated using BiQ Analyzer software 
[33]. 

TaqMan allelic discrimination assays 

TaqMan allelic discrimination assays were performed 
according to the manufacturer's instructions with the 
following adjustments: cDNA from B lymphoblasts was 
preamplified for 14 cycles. PCR products were gel-puri- 
fied and subsequently used as templates in the genotyp- 
ing of samples. The specific primer sequences used are 
available upon request. 

In vitro CTCF binding analysis using immobilized 
templates 

Crude nuclear extract was prepared from 1 x 10 9 Jurkat 
cells grown in growth media (RPMI 1640 with 10% fetal 
bovine serum) according to methods described pre- 
viously [56]. Biotinylated template DNA was generated 
by PCR amplification of the IGF2BP1 intronic region 
using a biotinylated/nonbiotinylated primer combina- 
tion. The specific primer sequences are available upon 
request. For each binding reaction, 1 pM biotinylated 
DNA template was coupled to 50- (ig streptavidin-linked 



magnetic beads (Dynabeads M-280 Streptavidin; Invitro- 
gen, Carlsbad, CA, USA). Templates immobilized to 
magnetic beads were washed three times in B&W buffer 
(5 mM Tris, pH 7.5, 0.5 mM ethylenediaminetetraacetic 
acid (EDTA), 1 M NaCl) and resuspended in Jurkat 
nuclear extract. After a two-hour incubation at 4°C, 
immobilized templates were washed three times in 
Dignam buffer D (20 mM 4-(2-hydroxyethyl)-l-piperazi- 
neethanesulfonic acid, pH 7.9, 20% glycerol, 0.1 M KC1, 
1 mM EDTA, 0.1 mM ethylene glycol tetraacetic acid, 
1% Nonidet P-40, 1 mM dithiothreitol) containing pro- 
tease inhibitor (P8340; Sigma, St Louis, MO, USA). To 
recover template-bound proteins, beads were incubated 
in elution buffer (5 mM Tris, pH 7.5, 0.5 mM EDTA, 1 
M NaHC0 3 ) including protease inhibitors. After a 5- 
minute incubation, the eluate was removed and trans- 
ferred into a fresh tube. The presence of CTCF in the 
eluate was determined using standard Western blot ana- 
lysis protocols. 

Additional material 



Additional file 1: Table SI. Genomic coordinates of 293 genomic 
sites that are marked by both CTCF and H2K9me3 Table S2. List of 
genes tested for monoallelic expression in lymphoblastoid cell lines. 

Additional file 2: Figure SI. Detection and colocalization of CTCF 
and H3K9me3 at the human IGF2-H19 ICR locus by ChlP-chip 
experiments. Top: Enrichment of CTCF binding sites. Middle: Results of 
large-scale array-based chromatin immunoprecipitation (ChlP-chip) survey 
of histone H3 trimethylated at lysine 9 (H3K9me3) binding. Bottom: H19 
exons demonstrating positions of CTCF binding and histone 
modifications relative to exons. Figure S2. Analysis of the clonal status 
of lymphoblastoid cell lines used in this study. Following the 
protocol described in [22], PCR amplification of two regions within the 
variable segment in the immunoglobulin heavy chain gene (conserved 
framework region 2 (Fr2) and the variable joining regions (VUH)) reveals 
the clonal status of lymphoblastoid cell lines (LCLs). The amplification 
product from a polyclonal population (P) gives rise to fragments of 
varying length due to the large number of rearranged immunoglobulin 
genes and appears as a broad band. Amplification of DNA derived from 
monoclonal cell lines results in one or two discrete bands within an 
expected size range of 240 to 280 bp. The polyclonal sample (P) was 
obtained from the peripheral blood of a healthy donor. Lanes 1 through 
4: monoclonal cell lines GM7007, GM7033, GM6989 and GM7030. Lanes 5 
through 8: monoclonal lines GM7050, GM7023, GM7059 and GM7057. 
MW, DNA size marker. Figure S3. Sequencing results give results 
identical to those derived from the TaqMan allelic discrimination 
assay. (A) Standard sequencing results of two individuals at SNP site 
rs9904288. (B) TaqMan allelic discrimination assay confirms the 
heterozygosity of GM7057 and the homozygosity of GM6990. Figure S4. 
Quantitative assessment of TaqMan genotyping using specific 
probe set at SNP rs1 1655950. The 3-UTR of the IGF2BP1 gene was 
amplified using primers given in Supplemental Table 2. This segment 
contains an A/G SNP. The PCRs included a FAM-labeled probe for the A 
allele and a VIC-labeled probe for the B allele. After PCR amplification, an 
end point fluorescence reading was taken on the ABI PRISM 7700 with 
SDS version 1.4 software (Applied Biosystems). The determination of the 
quantitative assignment of known genotypes is plotted. Concentration 
dilutions were created using known homozygous cell lines. Preparations 
of gDNA samples shown represent the following allele B/allele A ratios: 
100:0, 80:20, 60:40, 50:50, 40:20, 20:80 and 0:100. Heterozygosity was 
based on the fluorescence intensity of FAM, VIC or both dyes together. 
Error bars indicate 5% of triplicate sample value. Allele A curve yields y = 
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0.01 02x + 0.0415 with R 2 = 0.98934. Allele B curve yields y = -0.0085* + 
0.9796 with R 2 = 0.98196 Figure S5. Phylogenetic tree of motifs 
determined from motif analysis of the 8,462 loci derived from the 
ChlP-chip analysis using STAMP. All members of the highlighted 
group have matches identical to the canonical CTCF motif model as part 
of the JASPAR transcription factor binding site database. The resulting 
familial binding profile for all 68 such models is displayed. Figure S6. 
Fine mapping of CTCF motifs in sequences enriched in ChlP-chip 
experiments. Motif reads were mapped onto the genomic loci defined 
by ChlP-chip for CTCF binding. The extent of the ChlP-enriched 
sequences is indicated by red bar. Several read clusters are apparent and 
vary in depth and spatial extent (green areas). Figure S7. Frequency 
distribution of cluster depth for all motif clusters. A power law is 
apparent for clusters of depth < 10 with evident deviation in the 
population and a maximum of about 40. The vertical green line 
demarcates the low and high confidence clusters. Figure S8. 
Discrimination between high- and low-confidence sites. The region 
shown in Supplemental Figure S6 is annotated by overlaying enriched 
sequences with high- and low-confidence tracks. Figure S9. Sequences 
of immobilized templates used in in vitro binding experiments. 
CTCF core motifs Y and Z are underlined. Site-specific mutations in either 
the Y or Z motif are highlighted in yellow. In Y mut chFII and Y mut mmR3, 
site-specific mutations (highlighted in green) were introduced to 
generate CTCF motifs identical to the chicken HS4 Fll site and the mouse 
imprinting control region R3. The IGF2 wild-type huB1 sequence is 
derived from the human IGF2 imprinting control region containing the 
methylation-sensitive CTCF binding site B1. 
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